Offline General Handwritten Word Recognition Using an Approximate BEAM Matching Algorithm
Abstract
A recognition system for general isolated offline handwritten words using an approximate segment-string matching algorithm is described. The fundamental paradigm employed is a character-based segment-then-recognize/match strategy. Additional user-supplied contextual information in the form of a lexicon guides a graph search to estimate the most likely word image identity. This system is designed to operate robustly in the presence of document noise, poor handwriting, and lexicon errors, so this basic strategy is significantly extended and enhanced. A preprocessing step is initially applied to the image to remove noise artifacts and normalize the handwriting. An oversegmentation approach is taken to improve the likelihood of capturing the individual characters embedded in the word. The goal is to produce a segmentation point set that contains one subset which is the correct segmentation of the word image. This is accomplished by a segmentation module, employing several independent detection rules based on certain key features, which finds the most likely segmentation points of the word. Next, a sliding window algorithm, using a character recognition algorithm with a very good noncharacter rejection response, is used to find the most likely character boundaries and identities. A directed graph is then constructed that contains many possible interpretations of the word image, many of them implausible. Contextual information is used at this point and the lexicon is matched to the graph in a breadth-first manner, under an appropriate metric. The matching algorithm employs a BEAM search algorithm with several heuristics to compensate for the most likely errors contained in the interpretation graph, including missing segments from segmentation failures, misrecognition of the segments, and lexicon errors. The most likely graph path and associated confidence are computed for each lexicon word to produce a final lexicon ranking. These confidences are very reliable and can be later thresholded to decrease total recognition error. Experiments highlighting the characteristics of this algorithm are given.
Index Terms: Handwriting recognition, OCR, BEAM search, word segmentation, machine reading, pattern recognition.
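The lexicon-to-graph matching described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the graph layout, the names `match_word` and `rank_lexicon`, the fixed `MISS_PENALTY`, and the averaged log-confidence metric are all assumptions chosen for illustration. Each graph edge spans one or more oversegmented pieces and carries the character hypotheses a recognizer might return; a beam search then looks for the best path that spells each lexicon word.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical interpretation graph: edges[(i, j)] maps a character to the
# log-confidence that the image between segmentation points i and j is that
# character. Points are numbered 0..last.
Graph = Dict[Tuple[int, int], Dict[str, float]]

# Assumed score when the recognizer did not propose the wanted character
# (a crude way to tolerate misrecognized segments).
MISS_PENALTY = math.log(0.01)


def match_word(graph: Graph, last: int, word: str,
               beam_width: int = 3, max_span: int = 3) -> float:
    """Best average log-confidence of a path 0 -> last spelling `word`."""
    if not word:
        return float("-inf")
    beam: List[Tuple[float, int]] = [(0.0, 0)]  # (score so far, current point)
    for ch in word:
        candidates = []
        for score, pos in beam:
            # Try every edge leaving `pos` that merges up to max_span pieces.
            for end in range(pos + 1, min(pos + max_span, last) + 1):
                labels = graph.get((pos, end))
                if labels is not None:
                    candidates.append((score + labels.get(ch, MISS_PENALTY), end))
        # Beam pruning: keep only the highest-scoring partial paths.
        candidates.sort(reverse=True)
        beam = candidates[:beam_width]
        if not beam:
            return float("-inf")
    finals = [score for score, pos in beam if pos == last]
    return max(finals) / len(word) if finals else float("-inf")


def rank_lexicon(graph: Graph, last: int, lexicon: List[str],
                 beam_width: int = 3) -> List[Tuple[float, str]]:
    """Score every lexicon word and return (confidence, word), best first."""
    scored = [(match_word(graph, last, w, beam_width), w) for w in lexicon]
    return sorted(scored, reverse=True)


# Toy word image cut into 4 pieces; the letter "a" spans pieces 1..3.
graph: Graph = {
    (0, 1): {"c": math.log(0.9), "e": math.log(0.05)},
    (1, 2): {"i": math.log(0.3)},
    (1, 3): {"a": math.log(0.8), "o": math.log(0.1)},
    (2, 3): {"l": math.log(0.2)},
    (3, 4): {"t": math.log(0.85), "l": math.log(0.1)},
}
ranking = rank_lexicon(graph, 4, ["cot", "cart", "cat", "eat"])
for confidence, word in ranking:
    print(f"{word}: {confidence:.3f}")
```

In this toy example "cat" ranks first because its path uses the merged edge (1, 3). The returned confidences could then be thresholded to reject weak matches, in the spirit of the abstract's final step.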
Similar resources
A Two-Stage Method for Recognizing Handwritten Farsi Words Using Adaptive Blocking of the Image Gradient
This paper presents a two-step method for offline handwritten Farsi word recognition. In the first step, in order to improve recognition accuracy and speed, an algorithm is proposed for initially eliminating lexicon entries unlikely to match the input image. For lexicon reduction, the words of the lexicon are clustered using the ISOCLUS and hierarchical clustering algorithms. Clustering is based on the featur...
A Word Matching Algorithm in Handwritten Arabic Recognition Using Multiple-Sequence Weighted Edit Distances
No satisfactory solutions are yet available for the offline recognition of handwritten cursive words, including the words of Arabic text. Word matching algorithms can greatly improve the OCR output when recognizing words of known and limited vocabulary. This paper describes the word matching algorithm used in the JU-OCR2 optical character recognition system of handwritten Arabic words. This sys...
Use of the Shearlet Transform and Transfer Learning in Offline Handwritten Signature Verification and Recognition
Despite the rapid growth of technology, the handwritten signature has been selected by users as the first option among biometrics. In this paper, a new methodology for offline handwritten signature verification and recognition based on the Shearlet transform and transfer learning is proposed. Since a large percentage of handwritten signatures are composed of curves and the performance of a sig...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Two-Tier Approach for Arabic Offline Handwriting Recognition
In this paper we present a novel approach for the recognition of offline Arabic handwritten text that is motivated by the Arabic letters’ conditional joining rules. A lexicon of Arabic words can be expressed in terms of a new alphabet of PAWs (Part of Arabic Word). PAWs can be expressed in terms of letters. The recognition problem is decomposed into two problems that are solved simultaneously. ...
Journal: IEEE Trans. Pattern Anal. Mach. Intell.
Volume: 23
Publication date: 2001